Matching Attributes across Overlapping Heterogeneous Data Sources Using Mutual Information

Author

  • Huimin Zhao
Abstract

Identifying matching attributes across heterogeneous data sources is a critical and time-consuming step in integrating the data sources. In this paper, the author proposes a method for matching the most frequently encountered types of attributes across overlapping heterogeneous data sources. The author uses mutual information as a unified measure of dependence that applies to various types of attributes. An example is used to demonstrate the utility of the proposed method, which is useful in developing practical attribute matching tools.
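To make the idea of mutual information as a dependence measure concrete, here is a minimal sketch in Python. It is not the paper's method: the abstract does not describe the estimator, so this uses a simple plug-in (frequency-count) estimate for categorical attributes only, and it assumes the records shared by the two sources have already been aligned row by row. The function name and the toy columns are hypothetical.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired categorical values.

    Assumption: xs[i] and ys[i] describe the same real-world record,
    i.e. the overlapping part of the two sources is already aligned.
    """
    if len(xs) != len(ys):
        raise ValueError("attribute columns must be aligned record by record")
    n = len(xs)
    if n == 0:
        return 0.0
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Hypothetical toy data: the same six records as seen in two sources.
source_a_gender = ["M", "F", "F", "M", "F", "M"]
source_b_sex = [1, 2, 2, 1, 2, 1]                  # same information, different coding
source_b_region = ["N", "N", "S", "S", "N", "S"]   # unrelated attribute

score_match = mutual_information(source_a_gender, source_b_sex)
score_other = mutual_information(source_a_gender, source_b_region)
```

In this toy case the differently coded pair scores higher than the unrelated pair, which is the property an attribute matcher would rely on: the score depends on how well one attribute predicts the other, not on whether the values look alike. Numeric attributes would need a different estimator (for example, discretization first), which this sketch does not cover.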

Similar resources

A Weighted Multi-Attribute Method for Matching User-Generated Points of Interest

To a large degree, the attraction of Big Data lies in the variety of its heterogeneous multi-thematic and multidimensional data sources and not merely its volume. To fully exploit this variety, however, requires conflation. This is a two step process. First, one has to establish identity relations between information entities across the different data sources; and second, attribute values have ...

Toward the Scalable Integration of Internet

This dissertation in a broad sense focuses on understanding the fundamental aspects of building a large-scale information integration system that can answer complex queries over a large number of heterogeneous Internet data sources. Among many challenges in achieving this goal, we focus on two key issues: efficient query processing and schema matching. Most of the data the integration system pr...

An Attribute–driven Approach for Image Registration Using Road Networks

Geospatial analysis is becoming increasingly dependent on the integration of data from heterogeneous sources. In this paper, we present an automated, feature-based approach for geometric co-registration using networks of roads (or other similar features). This approach is based on a graph matching scheme that models networks as graphs with embedded invariant attributes. The main advantages of o...

Breaking the Deadlock: Simultaneously Discovering Attribute Matching and Cluster Matching with Multi-Objective Simulated Annealing

In this paper, we present a data mining approach to challenges in the matching and integration of heterogeneous datasets. In particular, we propose solutions to two problems that arise in combining information from different results of scientific research. The first problem, attribute matching, involves discovery of correspondences among distinct numeric-typed summary features (“attributes”) th...

Sub-Merge: Diving Down to the Attribute-Value Level in Statistical Schema Matching

Matching and merging data from conflicting sources is the bread and butter of data integration, which drives search verticals, e-commerce comparison sites and cyber intelligence. Schema matching lifts data integration—traditionally focused on well-structured data—to highly heterogeneous sources. While schema matching has enjoyed significant success in matching data attributes, inconsistencies c...

Journal title:
  • J. Database Manag.

Volume 21

Publication date: 2010